Enterprise Database Systems
Getting Started with Hive
Getting Started with Hive: Bucketing & Window Functions
Getting Started with Hive: Introduction
Getting Started with Hive: Loading and Querying Data
Getting Started with Hive: Optimizing Query Executions
Getting Started with Hive: Optimizing Query Executions with Partitioning
Getting Started with Hive: Viewing and Querying Complex Data

Getting Started with Hive: Bucketing & Window Functions

Course Number:
it_dsgshvdj_06_enus
Lesson Objectives

Getting Started with Hive: Bucketing & Window Functions

  • Course Overview
  • implement bucketing for a Hive table and explore the structure of the table and bucket on HDFS
  • apply both bucketing and partitioning for a table and describe the structure of such a table on HDFS
  • extract further performance from Hive queries by sorting the contents of buckets
  • work with samples of a Hive table by dividing it into buckets
  • perform join operations on three or more tables by chaining the joins
  • implement a window function to calculate running totals on an ordered dataset
  • apply a window function within a partition of your dataset
  • apply bucketing of Hive tables to boost query performance and to use window functions

Overview/Description

In this Skillsoft Aspire course, learners explore how Apache Hive query executions can be optimized, including techniques such as bucketing data sets, and how windowing functions can be used to extract meaningful insights from data. This 10-video course assumes previous work with partitions in Hive, as well as a conceptual understanding of how buckets can improve query performance. Learners begin by focusing on how to use the bucketing technique to process big data efficiently. They then take a look at HDFS (the Hadoop Distributed File System) by navigating to the shell of the Hadoop master node; from there, the hadoop fs -ls command is used to examine the contents of a directory, revealing three subdirectories corresponding to three partitions based on the value of the category column. You will then explore how to combine the partitioning and bucketing techniques to further improve query performance. Finally, learners explore the concept of windowing, which helps users analyze a subset of ordered data, and see how this technique can be implemented in Hive.
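The bucketing, sampling, and windowing techniques this course covers can be sketched in HiveQL. This is an illustrative example only; the table and column names (transactions, customer_id, category, and so on) are hypothetical and not taken from the course.

```sql
-- Bucketing: rows are hashed on customer_id into 4 files (buckets) on HDFS.
-- Combined with partitioning, each category gets its own subdirectory,
-- and each subdirectory contains 4 bucket files.
CREATE TABLE transactions (
    txn_id      INT,
    customer_id INT,
    amount      DOUBLE,
    txn_date    DATE
)
PARTITIONED BY (category STRING)
CLUSTERED BY (customer_id) INTO 4 BUCKETS;

-- Sampling: read roughly one bucket out of four instead of the whole table.
SELECT * FROM transactions TABLESAMPLE (BUCKET 1 OUT OF 4 ON customer_id);

-- Window function: a running total over an ordered data set,
-- computed separately within each category partition.
SELECT txn_id,
       category,
       amount,
       SUM(amount) OVER (PARTITION BY category ORDER BY txn_date) AS running_total
FROM transactions;
```

Note that in older Hive versions (before 2.x), bucketed inserts typically also require `SET hive.enforce.bucketing = true;` before loading data.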



Target

Prerequisites: none

Getting Started with Hive: Introduction

Course Number:
it_dsgshvdj_01_enus
Lesson Objectives

Getting Started with Hive: Introduction

  • Course Overview
  • define what a data warehouse is and identify its characteristics
  • describe the functions served by relational databases and the features they offer
  • distinguish between Online Transaction Processing and Online Analytical Processing and identify the specific problems they are meant to solve
  • identify where Hive fits in the Hadoop ecosystem and how it simplifies working with Hadoop
  • describe the architecture of Hive and the functions served by HiveServer and the Metastore
  • identify the services and features offered by AWS, Azure, and GCP to run Hadoop and Hive on their infrastructure
  • describe the different primitive and complex data types available in Hive
  • compare managed and external tables in Hive and how they relate to the underlying data
  • contrast OLTP and OLAP systems, identify major components of Hadoop, explore Hive benefits for data analysis

Overview/Description

This 9-video Skillsoft Aspire course focuses solely on theory and involves no programming or query execution. Learners begin by examining what a data warehouse is and how it differs from a relational database, an important distinction because Apache Hive is primarily a data warehouse, despite offering a SQL-like interface for querying data. Hive facilitates work on very large data sets stored as files in the Hadoop Distributed File System, and lets users perform operations on that data in parallel by effectively transforming Hive queries into MapReduce operations. Next, you will hear about the types of data and operations that data warehouses and relational databases handle, before moving on to the basic components of the Hadoop architecture. Finally, the course discusses the features of Hive that make it popular among data analysts. The concluding exercise recalls the differences between online transaction processing and online analytical processing systems, asking learners to identify Hadoop’s three major components, list the Hadoop offerings on three major cloud platforms (AWS, Microsoft Azure, and Google Cloud Platform), and list the benefits of Hive for data analysts.
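The managed-versus-external table distinction and the complex data types mentioned in the objectives can be illustrated with a short HiveQL sketch. The table names, columns, and HDFS path here are hypothetical, chosen only for illustration:

```sql
-- Managed table: Hive owns the data, so DROP TABLE deletes both the
-- metadata and the underlying files in the Hive warehouse directory.
CREATE TABLE employees (
    id     INT,
    name   STRING,
    skills ARRAY<STRING>          -- one of Hive's complex (collection) types
);

-- External table: Hive tracks only the metadata, so DROP TABLE leaves
-- the files at the specified location untouched.
CREATE EXTERNAL TABLE employees_ext (
    id   INT,
    name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/employees';       -- hypothetical HDFS path
```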



Target

Prerequisites: none

Getting Started with Hive: Loading and Querying Data

Course Number:
it_dsgshvdj_02_enus
Lesson Objectives

Getting Started with Hive: Loading and Querying Data

  • Course Overview
  • use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster
  • define and create a simple table in Hive using the Beeline client
  • load a few rows of data into a table and query it with simple select statements
  • run Hive queries from the shell of a host where a Hive client is installed
  • define and run a join query involving two related tables
  • describe the structure of the Hive Metastore on the Hadoop Distributed File System (HDFS)
  • create, load data into, and query an external table in Hive and contrast it with a Hive-managed table
  • use the alter table statement to change the definition of a Hive table
  • work with temporary tables that are only valid for a single Hive session and recognize how they differ from regular tables
  • populate Hive tables with data in files on both HDFS and the file system of the Hive client
  • load data into multiple tables from the contents of another table
  • use the Hadoop shell to execute Hive query scripts and work with Hive tables

Overview/Description

Among the market’s most popular data warehouses used for data science, Apache Hive simplifies working with large data sets in files by representing them as tables. In this 12-video Skillsoft Aspire course, learners explore how to create, load, and query Hive tables. For this hands-on course, learners should have a conceptual understanding of Hive and its basic components, prior experience querying data from tables using SQL (Structured Query Language), and familiarity with the command line. Key concepts covered include provisioning a cluster, joining tables, and modifying tables. Demonstrations include using the Beeline client for Hive for simple operations: creating tables, loading them with data, and then running queries against them. Only tables with primitive data types are used here, with data loaded into these tables from both the HDFS (Hadoop Distributed File System) file system and local machines. Learners will work with the Hive metastore and with temporary tables, and see how they can be used. You will become familiar with the basics of the Hive query language and quite comfortable working with HDFS.
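The create-load-query workflow this course walks through might look like the following HiveQL sketch. The table names, columns, and file paths are illustrative assumptions, not taken from the course itself:

```sql
-- Create a simple table with primitive data types.
CREATE TABLE orders (
    order_id    INT,
    customer_id INT,
    total       DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load data from the local file system of the Hive client...
LOAD DATA LOCAL INPATH '/tmp/orders.csv' INTO TABLE orders;
-- ...or from a file already on HDFS (the file is moved, not copied).
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;

-- Query it, including a join with a related table.
SELECT c.name, SUM(o.total)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name;

-- Change the table definition.
ALTER TABLE orders ADD COLUMNS (order_date DATE);
```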



Target

Prerequisites: none

Getting Started with Hive: Optimizing Query Executions

Course Number:
it_dsgshvdj_04_enus
Lesson Objectives

Getting Started with Hive: Optimizing Query Executions

  • Course Overview
  • recognize how Hive translates queries to Hadoop MapReduce operations
  • identify the different options available in Hive to optimize query execution
  • recall how partitioning of a dataset can help queries run efficiently and identify the types of partitioning available in Hive
  • specify how bucketing improves query performance and compare it with partitioning a dataset
  • identify how to join tables in Hive to ensure the best performance of your query
  • work with techniques to improve performance and work with partitioning, bucketing and structured queries

Overview/Description

In this 7-video Skillsoft Aspire course, learners explore the optimizations that allow Apache Hive to process data in parallel, as well as the ways users can contribute to improving query performance. For this course, learners should have previous experience with Hive and familiarity with querying big data for analysis purposes. The course focuses only on concepts; no queries are run. Learners begin by exploring the different options available in Hive to query data in an optimal manner, then discuss how to split data into smaller chunks, specifically through partitioning and bucketing, so that queries need not scan the full data set each time. Hive truly democratizes access to data stored in a Hadoop cluster: it eliminates the need to know MapReduce in order to process cluster data and makes the data accessible using the Hive query language, with all files in Hadoop exposed in the form of tables. Watch demonstrations of structuring queries to reduce the number of MapReduce operations generated by Hive and to speed up query executions. Other concepts covered include partitioning, bucketing, and joins.
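The partitioning idea described above, where queries avoid scanning the full data set, can be sketched in HiveQL. The table, columns, and values here are hypothetical examples, not drawn from the course:

```sql
-- Partitioning: each distinct country value becomes an HDFS subdirectory,
-- so a filter on the partition column reads only the matching directory
-- (partition pruning) rather than the full data set.
CREATE TABLE page_views (
    user_id INT,
    url     STRING
)
PARTITIONED BY (country STRING);

-- Dynamic partition inserts usually require these session settings:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Only the country=US subdirectory is scanned by this query.
SELECT COUNT(*) FROM page_views WHERE country = 'US';
```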



Target

Prerequisites: none

Getting Started with Hive: Optimizing Query Executions

Course Number:
it_dsgshvdj_04_enus
Lesson Objectives

Getting Started with Hive: Optimizing Query Executions

  • Course Overview
  • recognize how Hive translates queries to Hadoop MapReduce operations
  • identify the different options available in Hive to optimize query execution
  • recall how partitioning of a dataset can help queries run efficiently and identify the types of partitioning available in Hive
  • specify how bucketing improves query performance and compare it with partitioning a dataset
  • identify how to join tables in Hive to ensure the best performance of your query
  • work with techniques to improve performance and work with partitioning, bucketing and structured queries

Overview/Description

In this 7-video Skillsoft Aspire course, learners can explore optimizations allowing Apache Hive to handle parallel processing of data, while users can still contribute to improving query performance. For this course, learners should have previous experience with Hive and familiarity with querying big data for analysis purposes. The course focuses only on concepts; no queries are run. Learners begin to understand how to optimize query executions in Hive, beginning with exploring different options available in Hive to query data in an optimal manner. Discuss how to split data into smaller chunks, specifically, partitioning and bucketing, so that queries need not scan full data sets each time. Hive truly democratizes access to data stored in a Hadoop cluster, eliminating the need to know MapReduce to process cluster data, and makes data accessible using the Hive query language. All files in Hadoop are exposed in the form of tables. Watch demonstrations of structuring queries to reduce numbers of map reduce operations generated by Hive, and speeding up query executions.  Other concepts covered include partitioning, bucketing, and joins.



Target

Prerequisites: none

Getting Started with Hive: Optimizing Query Executions

Course Number:
it_dsgshvdj_04_enus
Lesson Objectives

Getting Started with Hive: Optimizing Query Executions

  • Course Overview
  • recognize how Hive translates queries to Hadoop MapReduce operations
  • identify the different options available in Hive to optimize query execution
  • recall how partitioning of a dataset can help queries run efficiently and identify the types of partitioning available in Hive
  • specify how bucketing improves query performance and compare it with partitioning a dataset
  • identify how to join tables in Hive to ensure the best performance of your query
  • work with techniques to improve performance and work with partitioning, bucketing and structured queries

Overview/Description

In this 7-video Skillsoft Aspire course, learners can explore optimizations allowing Apache Hive to handle parallel processing of data, while users can still contribute to improving query performance. For this course, learners should have previous experience with Hive and familiarity with querying big data for analysis purposes. The course focuses only on concepts; no queries are run. Learners begin to understand how to optimize query executions in Hive, beginning with exploring different options available in Hive to query data in an optimal manner. Discuss how to split data into smaller chunks, specifically, partitioning and bucketing, so that queries need not scan full data sets each time. Hive truly democratizes access to data stored in a Hadoop cluster, eliminating the need to know MapReduce to process cluster data, and makes data accessible using the Hive query language. All files in Hadoop are exposed in the form of tables. Watch demonstrations of structuring queries to reduce numbers of map reduce operations generated by Hive, and speeding up query executions.  Other concepts covered include partitioning, bucketing, and joins.



Target

Prerequisites: none

Getting Started with Hive: Optimizing Query Executions with Partitioning

Course Number:
it_dsgshvdj_05_enus
Lesson Objectives

Getting Started with Hive: Optimizing Query Executions with Partitioning

  • Course Overview
  • use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster (not required if you already have a Hadoop environment set up with Hive)
  • define a table which will contain data partitioned based on the value in one of its columns
  • insert data into partitions of a Hive table and explore the partition and its data on HDFS
  • load data into table partitions from files
  • create and populate partitions in an external table
  • alter the definition of a partition to modify its contents
  • define and work with dynamic partitions on your Hive tables
  • configure a table to use more than one column to define partitions and explore the partition on HDFS
  • use partitioning to boost query performance in HDFS

Overview/Description

Continue to explore the versatility of Apache Hive, among today’s most popular data warehouses, in this 10-video Skillsoft Aspire course. Learners are shown ways to optimize query executions, including the powerful technique of partitioning data sets. This hands-on course assumes previous work with Hive tables using the Hive query language and with processing complex data types, along with a theoretical understanding of improving query performance by partitioning very large data sets. Demonstrations focus on the basics of partitioning: how to create partitions and load data into them. Learners work with both Hive-managed tables and external tables to see how partitioning works for each, then watch as the instructor navigates to the shell of the Hadoop master node and creates new directories in the Hadoop file system. Observe dynamic partitioning of tables and how it simplifies loading data into partitions. Finally, explore how using multiple columns in a table can partition the data within it. During this course, learners acquire a sound understanding of exactly how large data sets can be partitioned into smaller chunks, improving query performance.
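As a sketch of the dynamic partitioning covered here (the table and column names are illustrative; the configuration property names are standard Hive settings):

```sql
-- Enable dynamic partitioning so Hive derives partition values from the
-- data itself, rather than requiring each partition to be named explicitly.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Each distinct (country, state) pair in the source table becomes its own
-- HDFS subdirectory, e.g. .../country=US/state=CA/.
INSERT OVERWRITE TABLE customers_partitioned
PARTITION (country, state)
SELECT id, name, country, state
FROM customers_staging;
```

The partition columns must come last in the SELECT list, in the same order as they are declared in the PARTITION clause.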



Target

Prerequisites: none

Getting Started with Hive: Viewing and Querying Complex Data

Course Number:
it_dsgshvdj_03_enus
Lesson Objectives

Getting Started with Hive: Viewing and Querying Complex Data

  • Course Overview
  • load and access data in the form of arrays
  • work with data in the form of key-value pairs - map data structures in Hive
  • define and use structured data in the form of Hive struct types
  • transform complex data types to a tabular format to facilitate analysis using the explode and posexplode functions
  • combine the results of the explode function with other columns of a table to generate a lateral view
  • flatten multi-dimensional data structures by chaining lateral views
  • use the UNION and UNION ALL operations on table data and distinguish between the two
  • search for values in the results of a subquery using the IN and EXISTS clauses
  • create and load data into tables efficiently by including these operations in a single query
  • define and work with views in Hive to simplify querying and control access to data
  • perform queries and utilize views on complex data types available in Hive

Overview/Description

Learners explore working with complex data types in Apache Hive in this Skillsoft Aspire course, which assumes previous work with Hive tables using the Hive query language and comfort using a command-line interface or Hive client to run queries. Learners begin this 12-video, hands-on course by working with Hive tables whose columns are of complex data types (arrays, maps, and structs). Watch demonstrations of set operations and of transforming complex types into tabular form with the explode operation, then use lateral views to add more data to the exploded outputs. Course labs use the Beeline client; the instructor’s Beeline terminal runs on the master node of a Hadoop cluster provisioned on Google Cloud Platform using its Dataproc service, and learners are assumed to have access to a Hadoop cluster and Beeline, on-premises or in the cloud. Finally, learners observe how to use views to aggregate the contents of multiple columns. As the course concludes, you should be comfortable working with all types of data in Hive and performing analysis tasks on tables with both primitive and complex data types.
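A minimal sketch of the explode and lateral-view pattern described above, using a hypothetical table (names are illustrative only):

```sql
-- Hypothetical table with a complex (array) column.
CREATE TABLE employees (
  name   STRING,
  skills ARRAY<STRING>
);

-- explode() turns each array element into its own row; LATERAL VIEW joins
-- the exploded rows back to the other columns of the same table, so each
-- (name, skill) pair appears on one flat row.
SELECT name, skill
FROM employees
LATERAL VIEW explode(skills) skills_view AS skill;
```

The same pattern applies to map columns, where explode yields a key column and a value column instead of a single element column.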



Target

Prerequisites: none
